English-French Verb Phrase Alignment in Europarl for Tense Translation Modeling

Authors

  • Sharid Loáiciga
  • Thomas Meyer
  • Andrei Popescu-Belis
Abstract

This paper presents a method for verb phrase (VP) alignment in an English/French parallel corpus and its use for improving statistical machine translation (SMT) of verb tenses. The method starts from automatic word alignment performed with GIZA++, and relies on a POS tagger and a parser, in combination with several heuristics, in order to identify non-contiguous components of VPs, and to label the aligned VPs with their tense and voice on each side. This procedure is applied to the Europarl corpus, leading to the creation of a smaller, high-precision parallel corpus with about 320 000 pairs of finite VPs, which is made publicly available. This resource is used to train a tense predictor for translation from English into French, based on a large number of surface features. Three MT systems are compared: (1) a baseline phrase-based SMT; (2) a tense-aware SMT system using the above predictions within a factored translation model; and (3) a system using oracle predictions from the aligned VPs. For several tenses, such as the French imparfait, the tense-aware SMT system improves significantly over the baseline and is closer to the oracle system.
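
The VP alignment step, which links a possibly non-contiguous English VP to its French counterpart through GIZA++ word-alignment links, can be illustrated with a small sketch. This is not the authors' procedure: the abstract does not spell out its heuristics. The sketch assumes the VPs have already been identified on each side (for example by the POS tagger and parser), and each VP is given as a tuple of token indices. The function name `align_vps` and the "mutual best match by link count" rule are hypothetical, chosen here only as a simple precision-oriented filter.

```python
from collections import Counter

# A VP is a tuple of token indices; it may be non-contiguous,
# e.g. "has ... answered" with "not yet" in between.
VP = tuple[int, ...]


def align_vps(
    en_vps: list[VP],
    fr_vps: list[VP],
    links: set[tuple[int, int]],
) -> list[tuple[int, int]]:
    """Hypothetical VP aligner: pair English and French VPs via word links.

    `links` holds (english_token_index, french_token_index) pairs, as produced
    by a word aligner such as GIZA++. Returns (en_vp_index, fr_vp_index) pairs
    in which each VP is the other's best match (mutual best by link count).
    """
    # Map each token index to the VP that contains it, on each side.
    en_owner = {tok: k for k, vp in enumerate(en_vps) for tok in vp}
    fr_owner = {tok: k for k, vp in enumerate(fr_vps) for tok in vp}

    # Count the word-alignment links connecting each (English VP, French VP) pair;
    # links touching tokens outside any VP are ignored.
    counts: Counter[tuple[int, int]] = Counter()
    for i, j in links:
        if i in en_owner and j in fr_owner:
            counts[(en_owner[i], fr_owner[j])] += 1

    # Best partner in each direction; iterating in sorted order with a strict
    # ">" makes ties go deterministically to the lowest VP index.
    best_fr: dict[int, tuple[int, int]] = {}
    best_en: dict[int, tuple[int, int]] = {}
    for (e, f), n in sorted(counts.items()):
        if n > best_fr.get(e, (-1, 0))[1]:
            best_fr[e] = (f, n)
        if n > best_en.get(f, (-1, 0))[1]:
            best_en[f] = (e, n)

    # Keep only mutual best matches (a simple high-precision filter).
    return sorted((e, f) for e, (f, _) in best_fr.items() if best_en[f][0] == e)


# "The president has not yet answered" / "Le président n' a pas encore répondu"
en_vps = [(2, 5)]  # has ... answered
fr_vps = [(3, 6)]  # a ... répondu
links = {(0, 0), (1, 1), (2, 3), (3, 2), (3, 4), (4, 5), (5, 6)}
print(align_vps(en_vps, fr_vps, links))
```

In this toy pair, two links ("has"/"a" and "answered"/"répondu") connect the two VPs, so the single pair `(0, 0)` is returned. A real pipeline would then label each side of the pair with its tense and voice before adding it to the aligned-VP corpus.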

Similar resources

A Framework for Managing Verb Phrase Effective and Easy English-Hindi Machine Translation

Automatic machine translation from one language to another has been the subject of great attention in computational linguistics for many years. In English-Hindi machine translation, verb tuning is a vital operation. The present paper describes an approach to easy English-Hindi verb phrase mapping. This work gives satisfactory machine translation results over types of English sentences. It is observe...

Syntax Augmented Machine Translation via Chart Parsing with Integrated Language Modeling

We present a hierarchical phrase-based translation model which annotates and generalizes existing phrase translations with syntactic categories derived from parsing the target side of a parallel corpus. We associate target parse trees for each training sentence pair with a search lattice constructed from the existing phrase translations on the corresponding source sentence, and consider techniq...

Modeling verbal inflection for English to German SMT

German verbal inflection is frequently wrong in standard statistical machine translation approaches. German verbs agree with subjects in person and number, and they bear information about mood and tense. For subject–verb agreement, we parse German MT output to identify subject–verb pairs and ensure that the verb agrees with the subject. We show that this approach improves subject-verb agreement...

Cross-linguistic annotation of narrativity for English/French verb tense disambiguation

This paper presents manual and automatic annotation experiments for a pragmatic verb tense feature (narrativity) in English/French parallel corpora. The feature is considered to play an important role for translating English Simple Past tense into French, where three different tenses are available. Whether the French Passé Composé, Passé Simple or Imparfait should be used is highly dependent on...

Publication date: 2014